knitr::opts_chunk$set(
  warning = FALSE, message = FALSE,echo = TRUE, results = "hide", tidy = TRUE)

A1 : Data Exploration, normalization, and mapping

Basic information about my dataset

Platform title:Illumina HiSeq 2000 (Homo sapiens)

Submission data:Nov 02 2010

Last update data:Mar 27 2019

Organism:Mar 27 2019

Number of GEO datasets that use this technology:9316

Number of GEO samples that use this technology: 160195

# get supplementary file
seq_files = getGEOSuppFiles("GSE152201")
file_name = rownames(seq_files)
file_name
exp_data = read.delim(file_name[1], header = TRUE, check.names = FALSE)
head(exp_data)

Sample Data table

DT::datatable(data.frame(exp_data[1:5, c(2, 3, 6, 9, 12)]), options = list(scrollX = T),
    caption = "Table 1 :My data sample")

(Xie, Cheng, and Tan 2022)

Filter low count

# select the mda cell line data
exp_data_mda <- exp_data[, c(1, 2, 9:14)]
# converts it to count per million
count_per_million_mda = cpm(exp_data[, 3:8])
# change the row name
rownames(count_per_million_mda) <- exp_data_mda[, "gene_id"]
# keep cpm greater than 3 per row
keep = rowSums(count_per_million_mda > 1) >= 3
exp_data_filtered = exp_data_mda[keep, ]
summarised_gene_count <- sort(table(exp_data_filtered$gene_id), decreasing = TRUE)
# sort it in decreasing order
summarised_gene_count_unfiltered <- sort(table(exp_data_mda$gene_id), decreasing = TRUE)
# show an example

knitr::kable(summarised_gene_count[which(summarised_gene_count > 1)][1:6], format = "html")
dim(exp_data_mda)
# how many genes are keptn
low_count_filtered <- nrow(exp_data_filtered)

Visulize distribution after Normalization

# filter outlier using normalization
filtered_data_matrix <- as.matrix(exp_data_filtered[, 3:8])
# change the row names
rownames(filtered_data_matrix) <- exp_data_filtered$gene_id
# create a DGEList for normalization
d = DGEList(counts = filtered_data_matrix, group = samples$treatment)
# calculate normalization factor
d_TMM = calcNormFactors(d)
normalized_counts <- cpm(d_TMM)
# number of outlier filtered
nrow(exp_data_filtered) - nrow(normalized_counts)

(Chen, Lun, and Smyth 2016)

Dispersion

Small variation as indicated by the graph

(Chen, Lun, and Smyth 2016)

Identifier mapping

# map my ensembl id to hgnc symbol using the correct mart and biomart filter,
conversion_mda <- "mda_cell_line.rds"
if (file.exists(conversion_mda)) {
    mda_cell_line <- readRDS(conversion_mda)
} else {
    mda_cell_line <- getBM(attributes = c("ensembl_gene_id", "hgnc_symbol"), filters = "ensembl_gene_id",
        value = exp_data_filtered$gene_id, mart = ensembl)
    saveRDS(mda_cell_line, conversion_mda)
}

(Durinck et al. 2009)

## [1] 15746
## [1] 16953
# merge to include rest of the data
normalized_counts_merge <- merge(mda_cell_line, normalized_counts, by.x = 1, by.y = 0,
    all.y = TRUE)
# how many are missing
kable(normalized_counts_merge[1:5, 1:5], type = "html", caption = "Table 3 : missing hgnc symbols")

ensembl_id_missing_gene <- normalized_counts_merge$ensembl_gene_id[which(is.na(normalized_counts_merge$hgnc_symbol))]  #1207
no_symbol = length(ensembl_id_missing_gene)
# old hgnc from my original dataset
old_mapping <- merge(exp_data_filtered[, 1:2], data.frame(ensembl_id_missing_gene),
    by.x = 2, by.y = 1)
# some of the old hgnc symbol
kable(old_mapping[1:10, ], type = "html", caption = " Table 4: unmapped ensembl id  found in original dataset")
# how many ribosomal protein there are
count = nrow(old_mapping[grep(old_mapping$gene_name, pattern = "^RP"), ])

I merge the data set and 1207 observations are missing, 258are ribosomal genes but it is not really important for this dataset. The coverage now around 95%

# missing hgnc symbol
missing_ids_subset <- normalized_counts_merge[which(is.na(normalized_counts_merge$hgnc_symbol)),
    ]
# merge the missing symbol with old mapping
missing_ids_subset_withids <- merge(old_mapping, missing_ids_subset, by.x = 1, by.y = 1)
# filter out the NAs caused by the merge
missing_ids_subset_withids <- missing_ids_subset_withids[-3]
# set the same column names
colnames(missing_ids_subset_withids)[1:2] <- colnames(normalized_counts_merge)
# bind the old mapping with the rows that have missing hgnc symbol
finalized_normalized_counts <- rbind(normalized_counts_merge[which(!is.na(normalized_counts_merge$hgnc_symbol)),
    ], missing_ids_subset_withids)
# drop some NAs
finalized_normalized_counts <- normalized_counts_merge %>%
    drop_na(.)
# check and filter duplicated ensembl id
length(unique(finalized_normalized_counts$ensembl_gene_id))
length(unique(finalized_normalized_counts$hgnc_symbol))
finalized_normalized_counts <- distinct(finalized_normalized_counts, ensembl_gene_id,
    .keep_all = TRUE)
any(is.na(finalized_normalized_counts$hgnc_symbol))


# write.table(finalized_normalized_counts, file = file.path(getwd(),'Data',
# 'GSE152201_finalized_normalized_counts_mda_2022.txt'), sep = '\t')

Decription of my data

In the last assignment, I was working with gene expression data pulling from Geo with id GSE152201. This dataset describes the experiment done for two breast cancer cell lines; HCC70 and MDA. HCC70 is a basal cell line while MDA is a meschemyal cell line. Each cell line was treated with a control(ETOH), and a treatment(DEX) with three replicates each. DEX is a inducer of GR, the experiment is trying to find out what genes are upreguated by this inducer. Complemented by other experiments, they ultimately wanted to show that GR coordinates with STAT3 to active gene expression signature for basal-like triple-negative breast cancer.(Conway et al. 2020)

For this dataset,I filtered the gene that has count per million less than 3 for each cell line, normalizing by treatment using EdgeR. I also mapped the ensembl gene id to HGNC symbol using Biomart and merged the HGNC symbol that was originally existed in my dataset to the unmapped HGNC symbol.The final coverage above 95%

Since this data contains two cell lines, I am just going to perform one differential expression for the MDA cell line.

knitr::opts_chunk$set(warning = FALSE, message = FALSE, echo = TRUE, results = "hide",
    tidy = TRUE)

A2 : Differentail Gene Expression

Set up

All packages references in A2 Isserlin (2022b)

# select the normalized data
heatmap_matrix <- finalized_normalized_counts[, 3:8]
# rename the column and row names
rownames(heatmap_matrix) <- normalized_count_data$ensembl_gene_id
colnames(heatmap_matrix)
colnames(heatmap_matrix) <- colnames(normalized_count_data[, 3:8])

First scaled Heatmap

# row normalized the data
heatmap_matrix <- t(scale(t(heatmap_matrix)))
# some NAs, could be due to 0 variance
heatmap_matrix <- data.frame(heatmap_matrix)
# store 104 NAs
MDA_heatmap_missing <- heatmap_matrix %>%
    filter_all(any_vars(is.na(.)))
# filter 104 NAs
heatmap_matrix <- heatmap_matrix %>%
    drop_na(.)

# set up the color gradient
if (min(heatmap_matrix) == 0) {
    heatmap_col = colorRamp2(c(0, max(heatmap_matrix)), c("white", "red"))
} else {
    heatmap_col = colorRamp2(c(min(heatmap_matrix), 0, max(heatmap_matrix)), c("blue",
        "white", "red"))
}
# plot a heatmap
current_heatmap <- Heatmap(as.matrix(heatmap_matrix), show_row_dend = TRUE, show_column_dend = TRUE,
    col = heatmap_col, show_column_names = TRUE, show_row_names = FALSE, show_heatmap_legend = TRUE)
current_heatmap

This heat map show there is some separation of signals but is not really clear. (Gu, Eils, and Schlesner 2016; Gu et al. 2014; Isserlin 2022d)

T test

Since DEX is an inducer of the GR(NR3C1) gene. I would assume it would be significantly up-regulated. Here I ran a t-test to test this and it is significant

# get the column indexes for control and treatment group
etoh_samples <- grep(colnames(normalized_count_data), pattern = "EtoH")
dex_samples <- grep(colnames(normalized_count_data), pattern = "DEX")

# Find gene of interest, GR(NR3C1)
gr_gene_of_interest <- which(normalized_count_data$hgnc_symbol == "NR3C1")

# separate the data into control and treatment sample based on the given column
# indexes
gr_etoh_samples <- t(normalized_count_data[gr_gene_of_interest, etoh_samples])
colnames(gr_etoh_samples) <- c("etoh_samples")
# gr_etoh_samples

gr_dex_samples <- t(normalized_count_data[gr_gene_of_interest, dex_samples])
colnames(gr_dex_samples) <- c("dex_samples")
# gr_dex_samples

# Run a t test t.test(x=t(gr_etoh_samples),y=t(gr_dex_samples))
sjPlot::tab_model(t.test(x = t(gr_etoh_samples), y = t(gr_dex_samples)))
  Dependent variable
estimate CI p
39.17 15.24 – 63.10 0.018

Grouping

# extract useful information from the column names
samples <- data.frame(lapply(colnames(normalized_count_data)[3:8], FUN = function(x) {
    unlist(strsplit(x, split = "_"))[c(1, 2, 4)]
}))  #?
colnames(samples) <- colnames(normalized_count_data)[3:8]
rownames(samples) <- c("cell_line ", "treatments", "replicates")
samples <- data.frame(t(samples))
knitr::kable(samples[1:6, ], output = "html")
cell_line. treatments replicates
MDA231_DEX_4hr_rep1_SL35876 MDA231 DEX rep1
MDA231_DEX_4hr_rep2_SL35877 MDA231 DEX rep2
MDA231_DEX_4hr_rep3_SL35878 MDA231 DEX rep3
MDA231_EtoH_4hr_rep1_SL35879 MDA231 EtoH rep1
MDA231_EtoH_4hr_rep2_SL35880 MDA231 EtoH rep2
MDA231_EtoH_4hr_rep3_SL35881 MDA231 EtoH rep3

(Xie 2021)

PCA

The position of control(ETOH) and treatment(DEX) samples on the PCA plot is mostly apart.So I would use a simple model to fit

# 2D representation of my data, serve as reference for model building later
plotMDS(heatmap_matrix, col = c(rep("blue", 3), rep("red", 3)))
legend("topright", c("ETOH", "DEX"), pch = 19, fill = c("blue", "red"))
title(main = " Figure 2 : Principle Component Analysis Plot", sub = "MAD cell line")

From the plot we can see there is clear seperation of the two groups, so I will fit a simple model Isserlin (2022d)

Simple Model

# create a design matrix
model_design <- model.matrix(~samples$treatments)
knitr::kable(model_design[1:5, ], type = "html")
# convert dataframe to matrix
expressionMatrix <- as.matrix(normalized_count_data[, 3:8])
# assign row names and column names
rownames(expressionMatrix) <- normalized_count_data$ensembl_gene_id
colnames(expressionMatrix) <- colnames(normalized_count_data)[3:8]
# create a exprssion set
minimalSet <- ExpressionSet(assayData = expressionMatrix)
# fit the model
fit <- lmFit(minimalSet, model_design)
# use empirical bays to find genes that are differentially expressed trend =
# TRUE is specific for RNA-seq data
fit2 <- eBayes(fit, trend = TRUE)
# get the top-hits after correction for multiple hypothesis testing
topfit <- topTable(fit2, coef = ncol(model_design), adjust.method = "BH", number = nrow(expressionMatrix))
# merge the hgnc symbol by ensembl id
output_hits <- merge(normalized_count_data[, 1:2], topfit, by.y = 0, by.x = 1, all.y = TRUE)
# order the p value from high to low
output_hits <- output_hits[order(output_hits$P.Value), ]
knitr::kable(output_hits[1:10, 2:8], type = "html", row.names = FALSE, caption = " simple model fit result")
# number of genes pass un-adjusted threshold
num_p_val <- length(which(output_hits$P.Value < 0.05))
# number of genes pass adjusted threshold
num_adj_p <- length(which(output_hits$adj.P.Val < 0.05))
# get the fit for my gene of interest
gr <- output_hits[which(output_hits$hgnc_symbol == "NR3C1"), ]


DT::datatable(gr, option = list(scrollX = T), caption = "Table 3: model fit result for GR gene")
simple model fit result
hgnc_symbol logFC AveExpr t P.Value adj.P.Val B
NUDT16 -142.384330 105.054512 -60.48066 0 1.80e-05 1.443916
SCNN1G -5.687960 2.843980 -57.20976 0 1.80e-05 1.439396
TSC22D3 -169.955077 91.069378 -56.26152 0 1.80e-05 1.437938
NT5DC3 -160.940736 147.873895 -55.90880 0 1.80e-05 1.437377
MT1E -157.968254 95.207152 -54.25806 0 1.80e-05 1.434605
MT1X -6.336576 4.173485 -54.18148 0 1.80e-05 1.434470
BIRC3 -219.243719 133.252984 -48.43044 0 2.63e-05 1.422484
PXYLP1 -15.922962 9.317015 -47.48893 0 2.63e-05 1.420103
MT2A -208.178459 156.147866 -47.13680 0 2.63e-05 1.419176
FKBP5 -1137.546636 629.001346 -42.91191 0 3.54e-05 1.406250

(Xie, Cheng, and Tan 2022; Huber et al. 2015; Ritchie et al. 2015; Isserlin 2022d)

Q1 and Q2

There are num_p_val genes pass threshold (p value < 0.05) There are num_adj_p genes pass correction ( adjust p value <0.05) I used Benjamini-Hochberg(BH) for multiple hypothesis testing correction because BH is not as strict as bonferroni so I would not get all my genes filtered out and it correct false positives. I use 0.05 as my threshold because 0.05 is a general and common p value for statistically analysis. The purpose of this experiment is to identify all genes that are activated by the addition of DEX. If I do not corrected for mulitple hypothesis testing, there are 4000 more genes up-regulated because of the DEX. This seems a lot to me so I would use the adjusted p value with a threshold of 0.05

Volcano plot

Q3

# create a new column to show genes after correction that are up-regulated,
# down-regulated, and no change

output_hits$diffexpressed <- "no"
output_hits$diffexpressed[output_hits$logFC > 0 & output_hits$adj.P.Val < 0.05] <- "up"
output_hits$diffexpressed[output_hits$logFC < 0 & output_hits$adj.P.Val < 0.05] <- "down"
output_hits$diffexpressed[output_hits$hgnc_symbol == "NR3C1"] <- "NR3C1"

# use for plotting to allow my gene of interest to be on the top layer
df_layer_2 <- output_hits[output_hits$diffexpressed == "NR3C1", ]

# plot a volcano plot
ggplot() + geom_point(data = output_hits, aes(x = logFC, y = -log10(adj.P.Val), col = diffexpressed)) +
    geom_point(data = df_layer_2, aes(x = logFC, y = -log10(adj.P.Val), col = "NR3C1")) +
    theme_minimal() + xlim(-50, 50) + labs(title = " Figure 2: Volcano plot for differential expression")

(Wickham 2016)

Heatmap for genes with p value < 0.05

# filter the significant hits
top_hits <- output_hits$ensembl_gene_id[output_hits$P.Value < 0.05]
# create a heatmap matrix for genes that are significant
heatmap_matrix_tophits <- t(scale(t(heatmap_matrix[which(rownames(heatmap_matrix) %in%
    top_hits), ])))
# set up the color gradient
if (min(heatmap_matrix_tophits) == 0) {
    heatmap_col = colorRamp2(c(0, max(heatmap_matrix_tophits)), c("white", "red"))
} else {
    heatmap_col = colorRamp2(c(min(heatmap_matrix_tophits), 0, max(heatmap_matrix_tophits)),
        c("blue", "white", "red"))
}
# plot a heatmap for only those genes that are significant
current_heatmap <- Heatmap(as.matrix(heatmap_matrix_tophits), cluster_rows = TRUE,
    cluster_columns = TRUE, show_row_dend = TRUE, show_column_dend = TRUE, col = heatmap_col,
    show_column_names = TRUE, show_row_names = FALSE, show_heatmap_legend = TRUE,
    )
current_heatmap

(Gu, Eils, and Schlesner 2016; Gu et al. 2014; Isserlin 2022d)

Q4

After we select for genes that are significant, the heatmap shows a more clear picture of the experiment. This is because the experiment might be well-performed to eliminate any variation that is not resulted from the manipulation. It is also shown by the PCA plot that the control and treatment is well-separated. Gene sets are up-regulated in the control sample are down-regulated in the treatment group. Gene sets are down-regulated in the control sample are up-regulated in the treatment group.

Ordered heatmap for genes with adjusted p value < 0.05

let us not cluster the genes but group controls and experiments

top_hits <- output_hits$ensembl_gene_id[output_hits$adj.P.Val < 0.05]
heatmap_matrix_tophits <- t(scale(t(heatmap_matrix[which(rownames(heatmap_matrix) %in%
    top_hits), ])))
if (min(heatmap_matrix_tophits) == 0) {
    heatmap_col = colorRamp2(c(0, max(heatmap_matrix_tophits)), c("white", "red"))
} else {
    heatmap_col = colorRamp2(c(min(heatmap_matrix_tophits), 0, max(heatmap_matrix_tophits)),
        c("blue", "white", "red"))
}


# plot heatmap that orders columns based on what I have given (do not cluster
# columns)
current_heatmap <- Heatmap(as.matrix(heatmap_matrix_tophits), cluster_rows = TRUE,
    cluster_columns = TRUE, show_row_dend = TRUE, show_column_dend = FALSE, col = heatmap_col,
    show_column_names = TRUE, show_row_names = FALSE, show_heatmap_legend = TRUE,
    )
current_heatmap

(Gu, Eils, and Schlesner 2016; Gu et al. 2014; Isserlin 2022d)

A2: Threshold and over-representation analysis

filter duplicated hgnc symbol

# get hgnc symbol for significant genes after correction that are up-regulated
# and down-regulated
duplicate_num <- length(output_hits$hgnc_symbol) - length(unique(output_hits$hgnc_symbol))
# filter duplicated genes
output_hits <- distinct(output_hits, hgnc_symbol, .keep_all = TRUE)

There a 1282 duplicated hgnc id in my file, those are likely due to duplicated hgnc id I kept for my first assignment.

Enrichment analysis using gprofiler2

Q1

I decided to use the gprofiler2 package for enrichment analysis becuase it would be cohesive with my rest of code and gprofiler2 actually come with some nice graph I used Benjamini-Hochberg(BH) for multiple hypothesis testing correction there are thousand of pathway. BH is not as strict as bonferroni so I would not get all pathway out and it corrects false positives.

Q2

I choose GO biological processes , Reactome and Wikipathway.I choose those becuase I am interested in biological pathway annotation those three are popular and free. I want my initial analysis to be high quality so I exclude electronic version of GO. I am using the following version as shown in the gprofiler web server-Data sources - Show data versions GO:BP – annotations: BioMart, classes:releases/2021-12-15 REAC – annotations: BioMart, classes: 2022-1-3 WP – 2021-12-10

Some code in this section references a publication about gprofiler2(Kolberg et al. 2020)

# up-regulated subset
up_subset <- output_hits[which(output_hits$adj.P.Val < 0.05 & output_hits$logFC >
    0), ]
# down-regulated subset
down_subset <- output_hits[which(output_hits$adj.P.Val < 0.05 & output_hits$logFC <
    0), ]
# whole list
whole <- output_hits[which(output_hits$adj.P.Val < 0.05), ]
# enrichment analysis for up regulated subset
gp_up = gost(up_subset$hgnc_symbol, organism = "hsapiens", correction_method = "false_discovery_rate",
    exclude_iea = TRUE, sources = c("GO:BP", "REAC", "WP"))
# link to the gprofiler server
gp_up_link = gost(up_subset$hgnc_symbol, organism = "hsapiens", correction_method = "false_discovery_rate",
    exclude_iea = TRUE, sources = c("GO:BP", "REAC", "WP"), as_short_link = TRUE)
# filter based on term size and order by p value
gp_up$result -> df_up
df_up <- df_up[which(df_up$term_size < 200), ]
df_up_ordered <- df_up[order(df_up$p_value), ]


up_genesets <- nrow(df_up)

# construct a table about enrichment results
DT::datatable(df_up_ordered[, c(2, 3, 4, 9, 10, 11)], caption = " Enrichment analysis for up-regulated genes",
    options = list(scrollX = T))
# plot the enrichment result
p1 <- gostplot(gp_up, interactive = FALSE)
# highlight top ten hits
publish_gostplot(p1, highlight_terms = df_up_ordered$term_id[1:10])
Figure 5: Enrichment plot for up-regulated genes

Figure 5: Enrichment plot for up-regulated genes

Figure 5: Enrichment plot for up-regulated genes

Figure 5: Enrichment plot for up-regulated genes

# publish_gosttable(gp_up)
# for down regulated subset
gp_down = gost(down_subset$hgnc_symbol, organism = "hsapiens", correction_method = "false_discovery_rate",
    exclude_iea = TRUE, sources = c("GO:BP", "REAC", "WP"))
# link
gp_down_link = gost(down_subset$hgnc_symbol, organism = "hsapiens", correction_method = "false_discovery_rate",
    exclude_iea = TRUE, sources = c("GO:BP", "REAC", "WP"), as_short_link = TRUE)

# filter based on term size and order by p value
gp_down$result -> df_down
df_down <- df_down[which(df_down$term_size < 200), ]
df_down_ordered <- df_down[order(df_down$p_value), ]

down_genesets <- nrow(df_down)

# construct a table about enrichment results
DT::datatable(df_down_ordered[, c(2, 3, 4, 9, 10, 11)], caption = " Enrichment analysis for down-regulated genes",
    options = list(scrollX = T))
# plot the enrichment result
p2 <- gostplot(gp_down, interactive = FALSE)
# highlight top ten hits
publish_gostplot(p2, highlight_terms = df_down_ordered$term_id[1:10])
Figure 6: Enrichment plot for down-regulated genes

Figure 6: Enrichment plot for down-regulated genes

Figure 6: Enrichment plot for down-regulated genes

Figure 6: Enrichment plot for down-regulated genes

# for the whole set
gp_whole = gost(whole$hgnc_symbol, organism = "hsapiens", correction_method = "false_discovery_rate",
    exclude_iea = TRUE, sources = c("GO:BP", "REAC", "WP"))

gp_whole_link = gost(whole$hgnc_symbol, organism = "hsapiens", correction_method = "false_discovery_rate",
    exclude_iea = TRUE, sources = c("GO:BP", "REAC", "WP"), as_short_link = TRUE)

# filter based on term size and order by p vlaue
gp_whole$result -> df_whole
df_whole <- df_whole[which(df_whole$term_size < 200), ]
df_whole_ordered <- df_whole[order(df_whole$p_value), ]

whole_genesets <- nrow(df_whole)
# construct a table about enrichment results
DT::datatable(df_whole_ordered[, c(2, 3, 4, 9, 10, 11)], caption = " Enrichment analysis for down-regulated genes",
    options = list(scrollX = T))
# plot the enrichment result
p3 <- gostplot(gp_whole, interactive = FALSE)
# highlight top ten hits
publish_gostplot(p3, highlight_terms = df_whole_ordered$term_id[1:10])

Xie, Cheng, and Tan (2022)

Q3

For term size between 0 and 200, threshold of 0.05 after BH correction. There are 148 up-regulated genesets. There are 186 down-regulated genesets. There are a total 417 genesets using the whole list.

Q4

Comparing enrichment analysis result from using own up or down-regulated gene list versus using the whole list I see there are some difference in the top hit returned. Using the whole gene list gave us some new enrichment results such as MAPK cascade and miRNA transcription. MAPK cascade and miRNA all have regulatory role in breast cancer progression(Jiang et al. 2020 ; Loh et al. 2019). This is likely due the reason that some particular genes are highly expressed while other genes in the same geneset were not. However, in combination they gave a stronger signal.

Interpretation

Q1 and Q2

Yes, the enrichment analysis for up regulated genes supports the result of the paper that GR actives gene expression signature for basal-like triple-negative breast cancer. This is because among the 10 significant genesets in the enrichment analysis, NF-kB stimulates proliferation and stops programmed cell death in many cell types including basal breast cancer cells(Biswas et al. 2004 ). Also,receptor tyrosine kinases is an important signalling pathway that is involved in breast cancer(Hynes 2000)

Because MDA is a stem cell lins. There are someother top hits are related to stem cell characteristics. However, because here I just analyze differential expression after the induction of GR not STAT3, My analysis cannot confidently support the final conclusion of the paper that STAT3 and GR coordinates to active basal-like triple-negative breast cancer.

Citations

Biswas, Debajit K, Qian Shi, Shanon Baily, Ian Strickland, Sankar Ghosh, Arthur B Pardee, and J Dirk Iglehart. 2004. “NF-\(\kappa\)b Activation in Human Breast Cancer Specimens and Its Role in Cell Proliferation and Apoptosis.” Proceedings of the National Academy of Sciences 101 (27): 10137–42.
Chen, Yunshun, Aaron A T Lun, and Gordon K Smyth. 2016. “From Reads to Genes to Pathways: Differential Expression Analysis of RNA-Seq Experiments Using Rsubread and the edgeR Quasi-Likelihood Pipeline.” F1000Research 5: 1438. https://doi.org/10.12688/f1000research.8987.2.
Conway, Megan E, Joy M McDaniel, James M Graham, Katrin P Guillen, Patsy G Oliver, Stephanie L Parker, Peibin Yue, et al. 2020. “Stat3 and GR Cooperate to Drive Gene Expression and Growth of Basal-Like Triple-Negative Breast Cancer.” Cancer Research 80 (20): 4355–70.
Davis, Sean, and Paul Meltzer. 2007. “GEOquery: A Bridge Between the Gene Expression Omnibus (GEO) and BioConductor.” Bioinformatics 14: 1846–47.
Durinck, Steffen, Paul T. Spellman, Ewan Birney, and Wolfgang Huber. 2009. “Mapping Identifiers for the Integration of Genomic Datasets with the r/Bioconductor Package biomaRt.” Nature Protocols 4: 1184–91.
Gu, Zuguang, Roland Eils, and Matthias Schlesner. 2016. “Complex Heatmaps Reveal Patterns and Correlations in Multidimensional Genomic Data.” Bioinformatics.
Gu, Zuguang, Lei Gu, Roland Eils, Matthias Schlesner, and Benedikt Brors. 2014. “Circlize Implements and Enhances Circular Visualization in r.” Bioinformatics 30: 2811–12.
Huber, W., V. J. Carey, R. Gentleman, S. Anders, M. Carlson, B. S. Carvalho, H. C. Bravo, et al. 2015. Orchestrating High-Throughput Genomic Analysis with Bioconductor.” Nature Methods 12 (2): 115–21. http://www.nature.com/nmeth/journal/v12/n2/full/nmeth.3252.html.
Hynes, Nancy E. 2000. “Tyrosine Kinase Signalling in Breast Cancer.” Breast Cancer Research 2 (3): 1–4.
Isserlin, Ruth. 2022a. “Bcb420 - Computational Systems Biology Lecture. Lecture 5 - Data Exploration and Identifier Mapping.” https://q.utoronto.ca/courses/248455/files/19306012?module_item_id=3480061.
———. 2022b. “Bcb420 - Computational Systems Biology Lecture. Lecture 7 - Annotation Dataset and Intro to Pathway Analysis.” https://q.utoronto.ca/courses/248455/files/18120892?module_item_id=3210881.
———. 2022c. “Bcb420 - Computational Systems Biology. Lecture 4 - Exploring the Data and Basics of Normalization.” https://q.utoronto.ca/courses/248455/files/19273570?module_item_id=3476594.
———. 2022d. “Bcb420 - Computational Systems Biology. Lecture 6 - Differential Expression.” https://q.utoronto.ca/courses/248455/files/19309003?module_item_id=3480535.
Jiang, Weihua, Xiaowen Wang, Chenguang Zhang, Laiti Xue, and Liang Yang. 2020. “Expression and Clinical Significance of MAPK and EGFR in Triple-Negative Breast Cancer.” Oncology Letters 19 (3): 1842–48.
Kolberg, Liis, Uku Raudvere, Ivan Kuzmin, Jaak Vilo, and Hedi Peterson. 2020. “Gprofiler2–an r Package for Gene List Functional Enrichment Analysis and Namespace Conversion Toolset g: Profiler.” F1000Research 9.
Loh, Hui-Yi, Brendan P Norman, Kok-Song Lai, Nik Mohd Afizan Nik Abd Rahman, Noorjahan Banu Mohamed Alitheen, and Mohd Azuraidi Osman. 2019. “The Regulatory Role of microRNAs in Breast Cancer.” International Journal of Molecular Sciences 20 (19): 4940.
Morgan, Martin. 2021. BiocManager: Access the Bioconductor Project Package Repository. https://CRAN.R-project.org/package=BiocManager.
Ritchie, Matthew E, Belinda Phipson, Di Wu, Yifang Hu, Charity W Law, Wei Shi, and Gordon K Smyth. 2015. limma Powers Differential Expression Analyses for RNA-Sequencing and Microarray Studies.” Nucleic Acids Research 43 (7): e47. https://doi.org/10.1093/nar/gkv007.
Wickham, Hadley. 2016. Ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. https://ggplot2.tidyverse.org.
Wickham, Hadley, Mara Averick, Jennifer Bryan, Winston Chang, Lucy D’Agostino McGowan, Romain François, Garrett Grolemund, et al. 2019. “Welcome to the tidyverse.” Journal of Open Source Software 4 (43): 1686. https://doi.org/10.21105/joss.01686.
Xie, Yihui. 2021. Knitr: A General-Purpose Package for Dynamic Report Generation in r. https://yihui.org/knitr/.
Xie, Yihui, Joe Cheng, and Xianying Tan. 2022. DT: A Wrapper of the JavaScript Library ’DataTables’. https://CRAN.R-project.org/package=DT.
Zhu, Yuelin, Sean Davis, Robert Stephens, Paul S. Meltzer, and Yidong Chen. 2008. “GEOmetadb: Powerful Alternative Search Engine for the Gene Expression Omnibus.” Bioinformatics (Oxford, England) 24 (23): 2798–2800. https://doi.org/10.1093/bioinformatics/btn520.